
Bytecode and the Compiler

Two Functions That Look Different but Produce the Same Bytecode

Start with this puzzle:

import dis

def version_a(x):
    """Explicit multiplication."""
    return x * 2

def version_b(x):
    """Left shift - bit trick for multiply by 2."""
    return x << 1

def version_c(x):
    """Using a constant expression."""
    TWO = 1 + 1
    return x * TWO

Intuition says version_b (bit shift) might be faster than version_a (multiply), and that version_c adds an extra variable. Now look at the bytecode:

print("=== version_a ===")
dis.dis(version_a)

print("\n=== version_b ===")
dis.dis(version_b)

print("\n=== version_c ===")
dis.dis(version_c)

Output (line numbers and exact opcodes vary by CPython version):

=== version_a ===
  3           RESUME                   0
  4           LOAD_FAST                0 (x)
              LOAD_CONST               1 (2)
              BINARY_OP                5 (*)
              RETURN_VALUE

=== version_b ===
  7           RESUME                   0
  8           LOAD_FAST                0 (x)
              LOAD_CONST               1 (1)
              BINARY_OP                7 (<<)
              RETURN_VALUE

=== version_c ===
 11           RESUME                   0
 12           LOAD_CONST               1 (2)      ← TWO = 1+1 folded to 2
              STORE_FAST               1 (TWO)
 13           LOAD_FAST                0 (x)
              LOAD_FAST                1 (TWO)
              BINARY_OP                5 (*)
              RETURN_VALUE

version_a and version_b are structurally identical - same opcode count, same stack depth. version_c has an extra STORE_FAST/LOAD_FAST pair (slower than the others), but the 1 + 1 expression was folded to 2 at compile time - you never pay for the addition at runtime.

This is the bytecode compiler in action. Understanding it lets you predict what "optimisations" actually help and which are illusions.
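
The "structurally identical" claim can be checked programmatically rather than by eye. The following is a minimal sketch, assuming CPython 3.11 or later (where both operators compile to BINARY_OP); the helper name `opnames` is made up for this illustration.

```python
import dis

def version_a(x):
    return x * 2

def version_b(x):
    return x << 1

def opnames(func):
    """Return the opcode names of a function, ignoring their arguments."""
    return [instr.opname for instr in dis.get_instructions(func)]

names_a = opnames(version_a)
names_b = opnames(version_b)

# Same number of instructions on every CPython version
same_length = len(names_a) == len(names_b)

# On 3.11+ the names match too; only the BINARY_OP argument differs
same_names = names_a == names_b

print(f"same length: {same_length}, same opcode names: {same_names}")
```

On releases before 3.11 the names differ (BINARY_MULTIPLY versus BINARY_LSHIFT), but the instruction count is still the same.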

The Compilation Pipeline

Python source goes through six distinct stages before becoming executable bytecode:

Source text: "def f(x): return x + 1"


┌────────────────┐
│ Tokeniser │ Parser/tokenize.c
│ │ Text → token stream
└────────────────┘ NAME('def') NAME('f') OP('(') NAME('x') OP(')')
│ OP(':') NAME('return') NAME('x') OP('+') NUMBER('1')

┌────────────────┐
│ PEG Parser │ Parser/parser.c (generated from Grammar/python.gram)
│ (since 3.9) │ Tokens → Abstract Syntax Tree, built directly by grammar actions
└────────────────┘ Validates grammar: SyntaxError raised here


┌────────────────┐
│ AST Optimiser │ Python/ast_opt.c
│ │ Folds constant expressions in the AST (the PEG parser has no separate CST stage)
└────────────────┘ FunctionDef(name='f', args=[arg(arg='x')],
│ body=[Return(value=BinOp(left=Name('x'), op=Add(),
│ right=Constant(value=1)))])

┌────────────────┐
│ Symbol Table │ Python/symtable.c
│ │ Analyses scopes: which names are local/global/free?
└────────────────┘ 'x' → LOCAL (assigned as argument), 'f' → GLOBAL


┌────────────────┐
│ Compiler │ Python/compile.c
│ │ AST + symbol table → bytecode
└────────────────┘ Emits: RESUME 0, LOAD_FAST 0, LOAD_CONST 1, BINARY_OP 0,
│ RETURN_VALUE

┌────────────────┐
│ PyCodeObject │ Immutable, can be serialised to .pyc
│ │ co_code: b'\x97\x00...' (raw bytes)
└────────────────┘ co_consts: (None, 1)
co_varnames: ('x',)
co_names: ()
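
Most of these stages can be observed from Python itself. The sketch below is an approximation under stated assumptions: the `tokenize` and `symtable` modules are standard-library views of the tokeniser and symbol table rather than the exact C data structures, and the file name "<example>" is arbitrary.

```python
import ast
import io
import symtable
import tokenize
import types

source = "def f(x): return x + 1\n"

# Tokeniser: text -> token stream (whitespace-only tokens dropped for brevity)
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string.strip()
]

# Parser: tokens -> AST
tree = ast.parse(source)
func_node = tree.body[0]

# Symbol table: classify the names used inside f
module_table = symtable.symtable(source, "<example>", "exec")
f_table = module_table.get_children()[0]
x_symbol = f_table.lookup("x")

# Compiler: AST -> code objects (f's code object is a constant of the module)
module_code = compile(tree, "<example>", "exec")
f_code = next(c for c in module_code.co_consts if isinstance(c, types.CodeType))

print(tokens)
print(type(func_node).__name__, x_symbol.is_parameter(), f_code.co_varnames)
```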

Working with the AST

The ast module exposes the compiler's internal representation:

import ast

source = """
def greet(name, times=1):
    for _ in range(times):
        print(f"Hello, {name}!")
    return None
"""

# Parse to AST
tree = ast.parse(source)
print(ast.dump(tree, indent=2))

Output (abbreviated):

Module(
  body=[
    FunctionDef(
      name='greet',
      args=arguments(
        args=[arg(arg='name'), arg(arg='times')],
        defaults=[Constant(value=1)]),
      body=[
        For(
          target=Name(id='_', ctx=Store()),
          iter=Call(func=Name(id='range'), args=[Name(id='times')]),
          body=[
            Expr(value=Call(
              func=Name(id='print'),
              args=[JoinedStr(...)]))]),
        Return(value=Constant(value=None))],
      returns=None)],
  type_ignores=[])

Walking the AST to collect all function names:

import ast

class FunctionCollector(ast.NodeVisitor):
    def __init__(self):
        self.functions = []

    def visit_FunctionDef(self, node):
        self.functions.append(node.name)
        self.generic_visit(node)  # Continue walking into nested functions

    def visit_AsyncFunctionDef(self, node):
        self.functions.append(f"async {node.name}")
        self.generic_visit(node)

source = """
def outer():
    async def inner():
        pass
    def helper():
        pass
"""

tree = ast.parse(source)
collector = FunctionCollector()
collector.visit(tree)
print(collector.functions)  # ['outer', 'async inner', 'helper']

Modifying the AST before compilation (a technique used by testing frameworks):

import ast

class AssertRewriter(ast.NodeTransformer):
    """Rewrite a bare 'assert expr' so its message carries the expression's source text."""

    def visit_Assert(self, node):
        # Transform: assert x == y
        # Into:      assert x == y, "Assertion failed: x == y"
        if node.msg is None:
            # Recover the source text of the tested expression (ast.unparse needs Python 3.9+)
            expr_src = ast.unparse(node.test)
            node.msg = ast.Constant(value=f"Assertion failed: {expr_src}")
        return node

source = """
x = 5
y = 6
assert x == y
"""

tree = ast.parse(source)
rewriter = AssertRewriter()
new_tree = rewriter.visit(tree)
ast.fix_missing_locations(new_tree)  # New nodes need line/column info before compiling

# Compile and run the modified AST
code = compile(new_tree, '<string>', 'exec')
try:
    exec(code)
except AssertionError as e:
    print(f"AssertionError: {e}")
    # AssertionError: Assertion failed: x == y

The dis Module: Complete Reference

import dis

def example(items, threshold=10):
    result = []
    for item in items:
        if item > threshold:
            result.append(item)
    return result

# Basic disassembly
dis.dis(example)

# Structured access to bytecode
print("\n--- Bytecode objects ---")
bc = dis.Bytecode(example)
for instr in bc:
    print(f"  offset={instr.offset:3d}  "
          f"opname={instr.opname:<20s}  "
          f"arg={str(instr.arg):<5s}  "
          f"argval={instr.argval!r}")

# Get all instructions as a list
instructions = list(dis.get_instructions(example))
print(f"\nTotal instructions: {len(instructions)}")

# Code object details
code = example.__code__
print(f"\nco_varnames: {code.co_varnames}")
print(f"co_consts: {code.co_consts}")
print(f"co_names: {code.co_names}")
print(f"co_argcount: {code.co_argcount}")
print(f"co_flags: {code.co_flags:#010x}")
print(f"co_stacksize: {code.co_stacksize}")

# Disassemble a string directly
dis.dis("x = a + b * c")

Important Opcodes Explained

Opcode              What it does                                    When generated
──────────────────────────────────────────────────────────────────────────────────────────────
LOAD_FAST           Push localsplus[i] onto stack                   Local variable read
STORE_FAST          Pop stack top, store at localsplus[i]           Local variable write
LOAD_GLOBAL         Dict lookup in globals then builtins            Global name read
STORE_GLOBAL        Store into globals dict                         Global variable write
LOAD_CONST          Push co_consts[i]                               Literal (number, string, None)
LOAD_ATTR           Pop object, push getattr(obj, name)             obj.name access
STORE_ATTR          Pop value and obj, call setattr                 obj.name = value
BINARY_OP           Pop two items, apply operator, push result      a + b, a * b, etc.
CALL                Call a callable with N args from stack          Any function call
RETURN_VALUE        Pop top of stack, return to caller              return expr
FOR_ITER            Call __next__() on TOS; jump if StopIteration   Inside a for loop
GET_ITER            Call iter() on TOS; push iterator               for x in iterable:
BUILD_LIST          Pop N items, construct list, push               [a, b, c] literal
BUILD_MAP           Pop N key-value pairs, build dict               {k: v, ...} literal
JUMP_FORWARD        Unconditional forward jump                      End of if branch
POP_JUMP_IF_FALSE   Pop TOS; jump if it is falsy                    if condition:
PUSH_EXC_INFO       Push exception info for handler                 except clause
MAKE_FUNCTION       Create function object from code object         def f():
LOAD_CLOSURE        Load a cell variable for a closure              Closure creation
COPY_FREE_VARS      Copy free vars from closure into frame          Closure execution

.pyc Files: Bytecode Caching

When CPython imports a module, it caches the compiled bytecode in a .pyc file to avoid recompiling on the next run:

mymodule.py (source - human-readable)
__pycache__/
mymodule.cpython-312.pyc (compiled bytecode)

The .pyc file format:

Byte offset  Size  Content
──────────────────────────────────────────────────────────
0            4     Magic number: 2-byte bytecode version (little-endian) followed by b'\r\n'
                   e.g., bytes cb 0d 0d 0a for CPython 3.12 (compare importlib.util.MAGIC_NUMBER)
4            4     Flags bit field: 0 = timestamp-based; bit 0 set = hash-based; bit 1 = check source
8            4     Source mtime (timestamp-based only)
12           4     Source file size in bytes (timestamp-based only)
8            8     Source hash (hash-based only; occupies the mtime and size slots)
16           ...   marshal-serialised PyCodeObject

The magic number changes with every CPython release that changes bytecode semantics. If the magic number in a .pyc does not match the running interpreter, the .pyc is ignored and the source is recompiled.

import importlib.util
import marshal
import os
import struct
import time

def read_pyc(path):
    """Read and decode a .pyc file (16-byte header, CPython 3.7+)."""
    with open(path, 'rb') as f:
        magic = f.read(4)
        bit_field = struct.unpack('<I', f.read(4))[0]

        if bit_field & 1:  # Hash-based .pyc
            source_hash = f.read(8)
            print(f"Hash-based .pyc, source hash: {source_hash.hex()}")
        else:  # Timestamp-based .pyc
            timestamp = struct.unpack('<I', f.read(4))[0]
            size = struct.unpack('<I', f.read(4))[0]
            print(f"Timestamp: {time.ctime(timestamp)}, size: {size}")

        print(f"Magic: {magic.hex()}")
        code = marshal.load(f)
        print(f"Code object: {code}")
        print(f"co_filename: {code.co_filename}")
        return code

# Find a .pyc in your __pycache__
import json  # Importing the package lets CPython cache its bytecode, if the directory is writable

# cache_from_source maps a source path to the matching __pycache__ path for this interpreter
json_pyc = importlib.util.cache_from_source(json.__file__)

if os.path.exists(json_pyc):
    code = read_pyc(json_pyc)
    print(f"co_names[:5]: {code.co_names[:5]}")

Code Objects: The Compiled Artefact

A PyCodeObject is the result of compiling a function, class, or module. It is immutable, hashable, and can be serialised with marshal. Every function object holds a reference to a code object.

import types

def outer():
    x = 10
    def inner(y):
        return x + y  # 'x' is a free variable (captured from outer)
    return inner

code = outer.__code__
print("=== outer ===")
print(f"co_varnames: {code.co_varnames}")  # ('inner',) - x lives in a cell, so it is not listed here
print(f"co_cellvars: {code.co_cellvars}")  # ('x',) - x is captured by inner
print(f"co_freevars: {code.co_freevars}")  # () - outer has no free vars
print(f"co_consts: {code.co_consts}")      # contains 10 and <code object inner>; layout varies by version
print(f"co_nlocals: {code.co_nlocals}")    # 1

# The nested code object is stored among outer's constants; find it by type, not by index
inner_code = next(c for c in code.co_consts if isinstance(c, types.CodeType))
print("\n=== inner ===")
print(f"co_varnames: {inner_code.co_varnames}")  # ('y',)
print(f"co_freevars: {inner_code.co_freevars}")  # ('x',) - captured from outer
print(f"co_cellvars: {inner_code.co_cellvars}")  # ()

# co_flags encodes various function properties as a bitmask
print(f"\nouter co_flags: {code.co_flags:#010x}")
# Bit 0x04: *args   Bit 0x08: **kwargs   Bit 0x10: nested   Bit 0x20: generator
print(f"inner co_flags: {inner_code.co_flags:#010x}")
# Should have the NESTED flag (0x10) set
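
The code object only records the names of captured variables; the values live in cell objects attached to the function object. A small sketch of that split, reusing the same `outer`/`inner` shape as above:

```python
def outer():
    x = 10
    def inner(y):
        return x + y
    return inner

fn = outer()

# Names come from the (shared, immutable) code object
free_names = fn.__code__.co_freevars

# Values come from the cells attached to this particular function object
cell_values = [cell.cell_contents for cell in fn.__closure__]

print(dict(zip(free_names, cell_values)))
print(fn(5))
```

The two tuples line up by position: the i-th entry of co_freevars names the i-th cell in __closure__.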

Peephole Optimisation and Constant Folding

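CPython's compiler applies constant folding and basic dead code elimination. Constant folding has run on the AST since Python 3.7 (earlier releases relied on a bytecode peephole pass), while unreachable code is dropped later, on the control-flow graph. The exact placement of these passes shifts between releases: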

import dis

# 1. Constant folding: arithmetic on literals
def folded_arithmetic():
    return 60 * 60 * 24 * 365  # Should be precomputed to 31536000

dis.dis(folded_arithmetic)
# LOAD_CONST 1 (31536000) ← the entire expression is one constant
# RETURN_VALUE
# (some releases, e.g. 3.12, fuse this pair into a single RETURN_CONST)

# 2. String concatenation of literals
def folded_string():
    return "hello" + " " + "world"

dis.dis(folded_string)
# LOAD_CONST 1 ('hello world') ← folded at compile time
# RETURN_VALUE

# 3. Tuple of constants (used in 'in' membership tests)
def folded_tuple():
    x = 5
    return x in (1, 2, 3, 4, 5)  # Tuple of constants → LOAD_CONST

dis.dis(folded_tuple)
# LOAD_CONST 2 ((1, 2, 3, 4, 5)) ← tuple is a single constant
# CONTAINS_OP 0

# 4. Dead code after return
def dead_code():
    return 42
    x = 100   # Never executed
    print(x)  # Never executed

dis.dis(dead_code)
# LOAD_CONST 1 (42)
# RETURN_VALUE
# ← Dead code after RETURN_VALUE is eliminated

# 5. What does NOT get folded?
def not_folded(x):
    return x * (60 * 60)  # The literal 3600 is folded, but x * 3600 is not

dis.dis(not_folded)
# LOAD_FAST 0 (x)
# LOAD_CONST 1 (3600) ← 60*60 was folded to 3600
# BINARY_OP 5 (*) ← x * 3600 happens at runtime
# RETURN_VALUE

What CPython does NOT optimise at the bytecode level (but optimising compilers for languages like C or Go do):

  • Loop-invariant code motion
  • Inlining function calls
  • Strength reduction (e.g., replacing integer division by powers of 2 with right shifts)
  • Dead store elimination
  • Alias analysis

These are the domain of PyPy's JIT compiler and tools like Cython or Numba, not CPython's ahead-of-time compiler.
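
The missing strength reduction is easy to confirm. In this sketch (the function name `halve` is invented for the example), `n // 2` is still compiled as a floor division; no right-shift instruction appears in the bytecode.

```python
import dis

def halve(n):
    return n // 2

instructions = list(dis.get_instructions(halve))

# Look for any sign of a shift, either as an opcode name (pre-3.11)
# or as the operator shown in a BINARY_OP argument (3.11+)
uses_shift = any(
    "SHIFT" in instr.opname or ">>" in instr.argrepr
    for instr in instructions
)

for instr in instructions:
    print(instr.opname, instr.argrepr)
print(f"compiled to a shift: {uses_shift}")
```

The compiler cannot make this substitution safely anyway: `n` might be a float or any object that defines `__floordiv__`, and that is only known at runtime.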

Python 3.11+ Specialising Adaptive Interpreter

Python 3.11 introduced a fundamentally new optimisation strategy: specialisation. Instead of one generic opcode per operation, the interpreter observes what types flow through each opcode site and replaces the generic opcode with a specialised variant tuned for those types.

import dis

def add_integers(a, b):
    return a + b

# First, look at the "cold" bytecode (before specialisation)
dis.dis(add_integers)
# LOAD_FAST 0 (a)
# LOAD_FAST 1 (b)
# BINARY_OP 0 (+) ← Generic opcode
# RETURN_VALUE

# Call the function many times to trigger specialisation
for _ in range(100):
    add_integers(1, 2)

# On Python 3.11+, ask dis for the adaptive (specialised) view.
# Plain dis.dis() always shows the generic opcodes, and opcode.opmap
# does not list the specialised variants.
dis.dis(add_integers, adaptive=True)
# Expect BINARY_OP_ADD_INT where BINARY_OP was (exact names vary by version)

How specialisation works:

Execution 1: BINARY_OP (+) executes with int + int
→ Increment specialisation counter

Execution 2-7: Same - counter increments

Execution 8 (specialisation threshold ~8):
→ Observe: both operands are always int
→ Replace BINARY_OP with BINARY_OP_ADD_INT

Execution 9+: BINARY_OP_ADD_INT executes:
→ Checks: is left an int? is right an int?
→ If yes: directly calls long_add() - skips type dispatch entirely
→ If no: "deoptimises" back to generic BINARY_OP

Specialised opcode families (Python 3.11-3.13):

Generic Opcode   Specialised Variant     Condition
──────────────────────────────────────────────────────────────────────────
BINARY_OP +      BINARY_OP_ADD_INT       Both operands are int
BINARY_OP +      BINARY_OP_ADD_FLOAT     Both operands are float
BINARY_OP +      BINARY_OP_ADD_UNICODE   Both operands are str
LOAD_GLOBAL      LOAD_GLOBAL_MODULE      Found in module dict
LOAD_GLOBAL      LOAD_GLOBAL_BUILTIN     Found in builtins dict
LOAD_ATTR        LOAD_ATTR_SLOT          Attribute is a __slots__ member
LOAD_ATTR        LOAD_ATTR_WITH_HINT     Instance __dict__ with cached index
CALL             CALL_PY_EXACT_ARGS      Python function, exact arg count
CALL             CALL_BUILTIN_FAST       C builtin with positional args

The specialisation is adaptive - if the types change (a function that was always called with integers starts receiving floats), the specialised opcode "deoptimises" back to the generic form and the counter resets.

Modifying Bytecode at Runtime

Python's types.CodeType is the Python-accessible version of PyCodeObject. It is immutable, but you can create a modified copy and replace a function's __code__ attribute:

import dis

def original(x):
    """Return x * 2."""
    return x * 2

# Let's modify this function to return x * 3 instead
# by replacing the constant 2 with 3 in co_consts

original_code = original.__code__
print("Original co_consts:", original_code.co_consts)  # e.g. ('Return x * 2.', 2) - docstring first

# Create a new code object with modified constants
# In Python 3.8+, use code.replace()
# Swap only the 2 and keep every other constant (including the docstring) in place
modified_code = original_code.replace(
    co_consts=tuple(3 if c == 2 else c for c in original_code.co_consts)
)

# Replace the function's code object
original.__code__ = modified_code

# Verify
print(original(5))  # 15 instead of 10, on releases that load the 2 from co_consts
dis.dis(original)
# LOAD_FAST 0 (x)
# LOAD_CONST 1 (3) ← Now 3 instead of 2
# BINARY_OP 5 (*)
# RETURN_VALUE
# Note: newer releases may encode small ints directly in the instruction
# (LOAD_SMALL_INT), in which case swapping co_consts has no effect.

In practice, timing and profiling rarely need bytecode surgery. Wrapping the function at the Python level, or using cProfile for per-function statistics, is simpler and survives version changes:

import cProfile
import functools
import io
import pstats
import time

def add_timing(func):
    """Wrap func so that each call prints its wall-clock duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"{func.__name__} took {elapsed*1000:.3f}ms")
        return result

    return wrapper

# For whole-program profiling, use cProfile. It relies on the interpreter's
# profiling hooks rather than on modified bytecode.
def expensive_computation():
    total = 0
    for i in range(100_000):
        total += i ** 2
    return total

pr = cProfile.Profile()
pr.enable()
expensive_computation()
pr.disable()

s = io.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats('cumulative')
ps.print_stats(5)
print(s.getvalue())

compile() and exec(): Dynamic Code Execution

import ast
import dis

# compile() gives you a code object you can exec or inspect
code = compile("x = 1 + 2", "<string>", "exec")
print(type(code)) # <class 'code'>
dis.dis(code)

# exec in a custom namespace
namespace = {}
exec(code, namespace)
print(namespace['x']) # 3

# eval() for expressions
result = eval("2 ** 10 + 1")
print(result) # 1025

# Compile an AST directly - powerful for code generation
tree = ast.parse("result = [x * x for x in range(10)]")
# Modify the AST...
ast.fix_missing_locations(tree)
code = compile(tree, "<generated>", "exec")
ns = {}
exec(code, ns)
print(ns['result']) # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Different modes:
# 'exec' - module or function body (sequence of statements)
# 'eval' - single expression (must return a value)
# 'single' - single interactive statement (prints result like REPL)

# Security: exec with restricted namespaces does NOT provide real sandboxing
# Malicious code can escape any namespace restriction in Python
# Use subprocess/containers for untrusted code execution
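
The difference between the compile modes shows up as soon as a statement is passed where an expression is expected. A minimal sketch (the file-name strings "<expr>" and "<stmt>" are arbitrary labels):

```python
# 'eval' mode accepts a single expression and eval() returns its value
expr_code = compile("2 ** 10 + 1", "<expr>", "eval")
value = eval(expr_code)

# The same mode rejects statements such as an assignment
try:
    compile("x = 1", "<stmt>", "eval")
    statement_rejected = False
except SyntaxError:
    statement_rejected = True

# 'exec' mode accepts the statement; exec() itself always returns None
namespace = {}
exec_result = exec(compile("x = 1", "<stmt>", "exec"), namespace)

print(value, statement_rejected, exec_result, namespace["x"])
```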

Interview Q&A

Q1: What is a Python code object, and how does it differ from a function object?

A code object (PyCodeObject in C, code in Python) is the compiled representation of a function body - produced once at compile time (or import time) and shared across all calls to the function. It contains everything about the function's structure: co_code (bytecode as bytes), co_consts (a tuple of literal values), co_varnames (local variable names in index order), co_names (global/attribute names), co_freevars (captured variable names), co_argcount, co_stacksize, and source location information. A code object is immutable.

A function object (PyFunctionObject in C, function in Python) wraps a code object with the runtime context needed to call it: __globals__ (the module's global namespace dict), __defaults__ (default argument values), __closure__ (a tuple of cell objects for captured variables), __doc__, __name__, and __dict__ (for function attributes). A function object is created each time a def statement executes (not when the module is compiled). Multiple function objects can share the same code object - for example, a function defined inside a loop creates N function objects but only one code object.
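
The sharing described above can be observed directly. In this sketch (the names `make_adders` and `add` are invented for the example), a def inside a loop yields several function objects backed by a single code object:

```python
def make_adders(n):
    adders = []
    for i in range(n):
        # Each iteration executes the def statement again,
        # creating a new function object with its own default for i
        def add(x, i=i):
            return x + i
        adders.append(add)
    return adders

adders = make_adders(3)

distinct_functions = len({id(f) for f in adders})
distinct_code_objects = len({id(f.__code__) for f in adders})
results = [f(10) for f in adders]

print(distinct_functions, distinct_code_objects, results)
```

The per-function state (here the default value of `i`) lives on the function objects, which is why the three adders behave differently while running identical bytecode.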

Q2: What does the Python compiler optimise? What are its limitations compared to compiled languages?

CPython's compiler performs these optimisations: (1) Constant folding - 60 * 60 * 24 is evaluated at compile time to 86400, stored as a single LOAD_CONST; (2) String literal concatenation - adjacent string literals are folded; (3) Tuple constants - (1, 2, 3) in a membership test is a single constant; (4) Dead code elimination - code after an unconditional return is removed; (5) Boolean constant simplification - not True is folded to False.

The limitations are significant: no loop-invariant code motion, no function inlining, no strength reduction (integer division by power-of-2 → right shift), no alias analysis, no register allocation (the eval loop is always a stack machine), and no type-based optimisation in the standard compiler. CPython ships no JIT tier that is enabled by default (3.13 added an experimental JIT that must be switched on at build time). The specialising adaptive interpreter (3.11+) does opcode specialisation based on observed types at runtime, but it operates at the opcode level, not the native code level. PyPy, Cython, and Numba fill the gap for compute-intensive code.

Q3: What is the specialising adaptive interpreter introduced in Python 3.11? How does it speed up code?

The specialising adaptive interpreter (PEP 659) is a mechanism where individual bytecode instruction sites adapt based on the types they observe at runtime. Each opcode site has a specialisation counter. When the counter reaches a threshold (~8 executions), the interpreter examines the types of the operands and replaces the generic opcode with a specialised variant that is tuned for those types.

For example, BINARY_OP (+) becomes BINARY_OP_ADD_INT when both operands are always integers. The specialised version skips the normal type dispatch (which traverses tp_as_number->nb_add through the type object), instead calling long_add() directly after a fast type check. For LOAD_GLOBAL, the specialised LOAD_GLOBAL_MODULE version caches the dict version number and offset, turning a hash table lookup into a direct array access on cache hits.

If the types change (deoptimisation trigger), the specialised opcode is replaced with the generic version and the counter resets. This keeps execution correct at the cost of occasional deoptimisation overhead. The release notes report Python 3.11 as 10-60% faster than 3.10, averaging about 1.25x on the pyperformance suite; specialisation is one of several changes behind that figure.

Q4: What is a .pyc file and when is it invalidated?

A .pyc file is a bytecode cache - a serialised PyCodeObject stored on disk so that subsequent imports do not need to recompile the source. It lives in a __pycache__ directory next to the source file, named with the Python version: module.cpython-312.pyc.

The file starts with a 4-byte magic number that encodes the Python version and a compiler version counter. If the magic number does not match the running interpreter, the .pyc is rejected and the source is recompiled.

For source validation, CPython supports two modes: (1) Timestamp-based (default) - the .pyc stores the source file's mtime and size. On import, these are checked against the current mtime/size; if they differ, the source is recompiled. (2) Hash-based (opt-in via python -m compileall --invalidation-mode checked-hash, or the py_compile module) - the .pyc stores a hash of the source content. This is more reliable in environments where file timestamps are unreliable (e.g., CI/CD systems, version-controlled deployments). You can create a hash-based .pyc with py_compile.compile('file.py', invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH).
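
The header fields described in this answer can be inspected after compiling a throwaway module in hash-based mode. This is a sketch under stated assumptions: the module name `demo_module.py` and its contents are placeholders, and the header layout is the 16-byte form used since CPython 3.7.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "demo_module.py")
    with open(src, "w") as f:
        f.write("VALUE = 42\n")

    # Returns the path of the .pyc written under tmp/__pycache__/
    pyc_path = py_compile.compile(
        src, invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH
    )
    with open(pyc_path, "rb") as f:
        header = f.read(16)

magic = header[:4]
flags = struct.unpack("<I", header[4:8])[0]

is_hash_based = bool(flags & 0b01)   # bit 0: hash-based .pyc
check_source = bool(flags & 0b10)    # bit 1: validate the hash on import
magic_matches = magic == importlib.util.MAGIC_NUMBER

print(magic.hex(), is_hash_based, check_source, magic_matches)
```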

Q5: How do you read and modify Python bytecode at runtime? What are the legitimate use cases?

A function's bytecode is accessible via func.__code__. The dis module provides human-readable disassembly. To modify, create a new code object with code.replace(co_consts=..., co_code=..., ...) (Python 3.8+) and assign it to func.__code__.

Legitimate use cases: (1) Testing frameworks - pytest rewrites assert statements by modifying the AST before compilation to include detailed failure messages; (2) Coverage tools - coverage.py tracks which lines execute, mainly through the interpreter's tracing hooks (sys.settrace, or sys.monitoring on newer releases) rather than by rewriting bytecode; (3) Profilers - performance profilers can inject tracing code; (4) Debuggers - pdb uses sys.settrace which hooks into the eval loop's tracing mechanism; (5) Security scanning - static analysis of .pyc files without needing source; (6) Obfuscation - distributing .pyc files without source (limited protection, but used commercially).

Direct bytecode manipulation is fragile because bytecode format changes between Python versions. The AST-based approach (ast.NodeTransformer + compile()) is more stable. For profiling specifically, sys.setprofile and sys.settrace are the right hooks - they integrate with the eval loop's built-in tracing support without requiring bytecode modification.
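
As an illustration of the hook-based approach, here is a minimal sketch that counts Python-level calls with sys.setprofile. The functions `square` and `sum_of_squares` and the `call_counts` dictionary are invented for the example; a real profiler would also record timings and handle the other event types.

```python
import sys

call_counts = {}

def count_calls(frame, event, arg):
    """Profile callback: tally every Python-level function call by name."""
    if event == "call":
        name = frame.f_code.co_name
        call_counts[name] = call_counts.get(name, 0) + 1

def square(n):
    return n * n

def sum_of_squares(limit):
    return sum(square(i) for i in range(limit))

sys.setprofile(count_calls)
try:
    total = sum_of_squares(5)
finally:
    sys.setprofile(None)  # Always remove the hook, even if the call fails

print(total, call_counts.get("square"), call_counts.get("sum_of_squares"))
```

No bytecode is touched: the interpreter invokes the callback itself, which is why this approach keeps working when opcodes change between releases.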

© 2026 EngineersOfAI. All rights reserved.